Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem: the learning targets keep changing at every iteration because multiple agents update their policies simultaneously. Starting from first principles, in this paper we address the non-stationarity problem by proposing bidirectional action-dependent Q-learning (ACE). Central to ACE is a sequential decision-making process in which only one agent takes an action at a time. Within this process, each agent maximizes its value function given the actions taken by the preceding agents at the inference stage. In the learning phase, each agent minimizes a TD error that depends on how the subsequent agents reacted to its chosen action. Through this bidirectional dependency, ACE effectively turns a multi-agent MDP into a single-agent MDP. We implement the ACE framework by identifying a network representation that formulates the action dependency, so that the sequential decision process is computed implicitly in one forward pass. To validate ACE, we compare it with strong baselines on two MARL benchmarks. Empirical experiments demonstrate that ACE outperforms state-of-the-art algorithms on Google Research Football and the StarCraft Multi-Agent Challenge by a large margin. In particular, on SMAC tasks, ACE achieves a 100% success rate on almost all the hard and super-hard maps. We further study extensive research problems regarding ACE, including its extension, generalization, and practicability. Code is made available to facilitate further research.
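A minimal sketch of the sequential decision process the abstract describes: agents act one at a time, and each chooses the action that maximizes a Q-value conditioned on the actions already taken by the preceding agents. The `toy_q` function below is a hypothetical stand-in, not ACE's learned network.

```python
# Sketch of sequential, action-dependent inference: agent i selects its
# action greedily given (state, a_1, ..., a_{i-1}), turning the joint
# action choice into a chain of single-agent decisions.

def sequential_inference(q_fn, state, n_agents, n_actions):
    """Greedy sequential action selection over a shared Q-function."""
    joint_action = []
    for agent in range(n_agents):
        # Evaluate every candidate action given the preceding agents' choices.
        values = [q_fn(agent, state, tuple(joint_action) + (a,))
                  for a in range(n_actions)]
        joint_action.append(max(range(n_actions), key=lambda a: values[a]))
    return joint_action

# Toy Q-function (illustrative only): the first agent prefers the higher
# action index, and later agents are rewarded for matching the agent
# immediately before them.
def toy_q(agent, state, actions_so_far):
    if len(actions_so_far) < 2:
        return float(actions_so_far[-1])
    return 1.0 if actions_so_far[-1] == actions_so_far[-2] else 0.0

print(sequential_inference(toy_q, state=0, n_agents=3, n_actions=2))
```

Because each agent conditions on its predecessors, the loop above computes one coherent joint action rather than independent per-agent argmaxes; ACE's contribution is computing this chain implicitly in a single forward pass.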
Federated learning (FL) enables the building of robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and federated machine learning approaches, which facilitate building workflows for distributed learning across enterprises and enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package, and allows researchers to bring their data science workflows implemented in any training libraries (PyTorch, TensorFlow, XGBoost, or even NumPy) and apply them in real-world FL settings. This paper introduces the key design principles of FLARE and illustrates some use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms. Code is available at https://github.com/NVIDIA/NVFlare.
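To make the collaboration model concrete, here is a hedged NumPy illustration of federated averaging, the core aggregation pattern that FL SDKs such as the one described orchestrate; this is not NVIDIA FLARE's actual API, and the linear-regression clients are hypothetical.

```python
import numpy as np

# Each client trains locally on its own private data; only model weights
# leave the site, and the server aggregates them weighted by local
# sample counts (the FedAvg rule).

def local_step(weights, X, y, lr=0.1):
    """One gradient step of linear regression on a client's private data."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def fed_avg(client_weights, client_sizes):
    """Server-side aggregation: sample-count-weighted average of weights."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = [(X := rng.normal(size=(50, 2)), X @ true_w) for _ in range(3)]

weights = np.zeros(2)
for _ in range(200):
    updates = [local_step(weights, X, y) for X, y in clients]
    weights = fed_avg(updates, [len(y) for _, y in clients])

print(np.round(weights, 2))  # converges toward the true coefficients
```

The raw `X`/`y` arrays never move between "sites"; only `updates` cross the trust boundary, which is the property that homomorphic encryption or differential privacy then hardens further.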
Deep neural networks (DNNs) have achieved great success in various machine learning (ML) applications, delivering high-quality inference solutions in computer vision, natural language processing, virtual reality, and beyond. However, DNN-based ML applications also bring much greater computation and storage demands, which are particularly challenging for embedded systems with limited compute/storage resources, tight power budgets, and small form factors. The challenges also come from diverse application-specific requirements, including real-time responses, high-throughput performance, and reliable inference accuracy. To address these challenges, we introduce a series of effective design methodologies, including efficient ML model design, customized hardware accelerator design, and hardware/software co-design strategies, to enable efficient ML applications on embedded systems.
In recent years, there has been growing interest in few-shot knowledge graph completion (FKGC), which aims to infer unseen query triples about a relation given only a few reference triples of that relation. The primary focus of existing FKGC methods lies in learning relation representations that reflect the common information shared by the query and reference triples. To this end, these methods learn entity-pair representations from the direct neighbors of head and tail entities, and then aggregate the representations of the reference entity pairs. However, entity-pair representations learned only from direct neighbors may have low expressiveness when the involved entities have sparse direct neighborhoods or share a common local neighborhood with other entities. Moreover, merely modeling the semantic information of head and tail entities is insufficient to accurately infer their relational information, especially when they have multiple relations. To address these issues, we propose a Relation-Specific Context Learning (RSCL) framework, which exploits the graph contexts of triples to learn global and local relation-specific representations for few-shot relations. Specifically, we first extract the graph context of each triple, which can provide long-range entity-relation dependencies. To encode the extracted graph contexts, we propose a hierarchical attention network to capture contextualized information of triples and to highlight the valuable local neighborhood information of entities. Finally, we design a hybrid attention aggregator to evaluate the likelihood of query triples at the global and local levels. Experimental results on two public datasets demonstrate that RSCL outperforms state-of-the-art FKGC methods.
As AI democratization progresses, machine learning (ML) has been successfully applied to edge applications such as smartphones and automated driving. Nowadays, more applications require ML on tiny devices with extremely limited resources, such as implantable cardioverter defibrillators (ICDs), which is known as TinyML. Unlike ML on the edge, TinyML with a limited energy supply has a higher demand for low-power execution. Stochastic computing (SC), which represents data with bitstreams, is promising for TinyML because it can perform fundamental ML operations using simple logic gates instead of complicated binary adders and multipliers. However, SC commonly suffers from low accuracy on ML tasks due to its low data precision and the inaccuracy of its arithmetic units. Increasing the length of the bitstreams, as in existing works, can mitigate the precision issue but incurs higher latency. In this work, we propose a novel SC architecture, namely block-based stochastic computing (BSC). BSC divides inputs into blocks, so that latency can be reduced by exploiting high data parallelism. Moreover, optimized arithmetic units and an output revision (OUR) scheme are proposed to improve accuracy. On top of that, a global optimization approach is devised to determine the number of blocks, which can improve the latency-power trade-off. Experimental results show that BSC can outperform existing designs, achieving over 10% higher accuracy on ML tasks and over 6x power reduction.
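A minimal sketch of the unipolar stochastic-computing arithmetic the abstract builds on: a value p in [0, 1] is encoded as a bitstream with P(bit = 1) = p, and multiplying two independent streams reduces to a bitwise AND gate. Longer streams trade latency for precision, which is exactly the tension block-based SC targets; the stream lengths and values below are illustrative.

```python
import random

def encode(p, length, rng):
    """Encode probability p as a random bitstream of the given length."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode(stream):
    """Decode a bitstream back to a probability estimate."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    """SC multiplication: a bitwise AND of two independent streams,
    since P(x & y = 1) = P(x = 1) * P(y = 1) for independent bits."""
    return [x & y for x, y in zip(a, b)]

rng = random.Random(42)
length = 4096
sa = encode(0.5, length, rng)
sb = encode(0.25, length, rng)
product = decode(sc_multiply(sa, sb))
print(round(product, 3))  # close to 0.5 * 0.25 = 0.125
```

Shrinking `length` makes the estimate noisier while cutting latency proportionally; BSC's block partitioning is one way to process such streams in parallel instead of serially.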
Knowledge graph embedding (KGE) has been a research hotspot in recent years. Real-world knowledge graphs are mostly time-related, whereas most existing KGE algorithms ignore temporal information. Some existing methods encode temporal information directly or indirectly but ignore the balance of the timestamp distribution, which greatly limits the performance of temporal knowledge graph completion (KGC). In this paper, a temporal KGC method is proposed within a framework that encodes temporal information directly, and a given time slice is treated as the finest granularity for balancing the timestamp distribution. Extensive experiments on temporal knowledge graph datasets extracted from the real world demonstrate the effectiveness of our method.
The generalization gap in reinforcement learning (RL) is a significant obstacle that prevents RL agents from learning general skills and adapting to varying environments. Increasing the generalization capacity of RL systems can significantly improve their performance in real-world working environments. In this work, we propose a novel policy-aware adversarial data augmentation method to augment standard policy learning with automatically generated trajectory data. Unlike commonly used observation-transformation-based data augmentation, our proposed method adversarially generates new trajectory data based on the policy gradient objective and aims to more effectively improve the generalization capacity of RL agents through policy-aware data augmentation. Moreover, we deploy a mixup step to integrate the original and generated data, enhancing generalization capacity while mitigating the over-deviation of the adversarial data. We conduct experiments on a number of RL tasks to investigate the generalization performance of the proposed method, comparing it with standard baselines and the state-of-the-art mixreg approach. The results show that our method can generalize well with limited training diversity and achieves state-of-the-art generalization test performance.
Recently, we have seen rapid development of visual tracking solutions based on deep neural networks (DNNs). Some trackers combine DNN-based solutions with discriminative correlation filters (DCFs) to extract semantic features and successfully deliver state-of-the-art tracking accuracy. However, these solutions are highly computation-intensive and require long processing times, resulting in unreliable real-time performance. To deliver both high accuracy and reliable real-time performance, we propose a novel tracker called SiamVGG. It combines a convolutional neural network (CNN) backbone with a cross-correlation operator and takes advantage of features from exemplar images for more accurate object tracking. The architecture of SiamVGG is customized from VGG-16, with its parameters shared by both the exemplar image and the desired input video frames. We demonstrate the proposed SiamVGG on the OTB-2013/50/100 and VOT 2015/2016/2017 datasets with state-of-the-art accuracy, while maintaining decent real-time performance of 50 FPS running on a GTX 1080Ti. Our design achieves higher Expected Average Overlap (EAO) than ECO and C-COT in the VOT2017 Challenge.
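A hedged NumPy sketch of the cross-correlation step at the heart of Siamese trackers like the one described above: exemplar features are slid over search-region features, and the peak of the response map locates the target. The feature maps here are toy arrays, not VGG-16 activations.

```python
import numpy as np

def cross_correlate(search, exemplar):
    """Valid 2D cross-correlation: response[i, j] is the inner product
    of the exemplar with the search patch whose top-left corner is (i, j)."""
    sh, sw = search.shape
    eh, ew = exemplar.shape
    out = np.empty((sh - eh + 1, sw - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[i:i + eh, j:j + ew] * exemplar)
    return out

search = np.zeros((8, 8))
search[3:6, 4:7] = 1.0        # "target" planted with top-left corner (3, 4)
exemplar = np.ones((3, 3))    # template matching the target's pattern
response = cross_correlate(search, exemplar)
peak = np.unravel_index(np.argmax(response), response.shape)
print(peak)  # the response peak recovers the target's top-left corner
```

In a real Siamese tracker both inputs are deep feature maps produced by the shared backbone, and the correlation runs over channels as well, but the localization principle is the same.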
We propose a network for Congested Scene Recognition called CSRNet to provide a data-driven, deep learning method that can understand highly congested scenes and perform accurate count estimation as well as present high-quality density maps. The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN for the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations. CSRNet is an easily trained model because of its pure convolutional structure. We demonstrate CSRNet on four datasets (the ShanghaiTech dataset, the UCF CC 50 dataset, the WorldEXPO'10 dataset, and the UCSD dataset) and deliver state-of-the-art performance. On the ShanghaiTech Part B dataset, CSRNet achieves 47.3% lower Mean Absolute Error (MAE) than the previous state-of-the-art method. We extend the targeted applications to counting other objects, such as vehicles in the TRANCOS dataset. Results show that CSRNet significantly improves the output quality with 15.4% lower MAE than the previous state-of-the-art approach.
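A small illustration of why the dilated kernels mentioned above enlarge receptive fields without pooling: a dilation rate d inserts gaps between kernel taps, so a k-tap kernel covers k + (k - 1)(d - 1) input positions with the same parameter count. The 1D example below is illustrative, not CSRNet itself.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1D convolution with the given dilation rate; also returns
    the effective receptive-field span of the kernel."""
    k = len(kernel)
    span = k + (k - 1) * (dilation - 1)   # effective receptive field
    out = [sum(kernel[j] * x[i + j * dilation] for j in range(k))
           for i in range(len(x) - span + 1)]
    return np.array(out), span

x = np.arange(10, dtype=float)
kernel = [1.0, 1.0, 1.0]
dense, span1 = dilated_conv1d(x, kernel, dilation=1)    # plain convolution
dilated, span2 = dilated_conv1d(x, kernel, dilation=2)  # gapped taps
print(span1, span2)  # the 3-tap kernel spans 3 inputs densely, 5 when d=2
```

Stacking such layers grows the receptive field quickly while keeping the feature map at full resolution, which is what lets CSRNet's back-end produce density maps without the detail loss that pooling would cause.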
There are many artificial intelligence algorithms for autonomous driving, but directly installing these algorithms on vehicles is unrealistic and expensive. At the same time, many of these algorithms need an environment in which to train and optimize. Simulation is a valuable and meaningful solution that provides both training and testing functions, and one can say that simulation is a critical link in the autonomous driving world. There are many simulation applications and systems from companies and academia, such as SVL and Carla. These simulators advertise that they offer the closest-to-real-world simulation, but their environment objects, such as pedestrians and other vehicles around the agent vehicle, are programmed in advance: they can only move along pre-set trajectories, or their movements are determined by random numbers. What happens when all environment objects are also driven by artificial intelligence, so that their behaviors resemble real people or the natural reactions of other drivers? This problem is a blind spot for most simulation applications, and they cannot solve it easily. The Neurorobotics Platform from the TUM team of Prof. Alois Knoll introduces the ideas of "Engines" and "Transceiver Functions" to solve this multi-agent problem. This report begins with a brief study of the Neurorobotics Platform and analyzes the potential and possibility of developing a new simulator that achieves the goal of true real-world simulation. Then, based on the NRP-Core platform, this initial development aims to construct an initial demo experiment. The report starts with the basic knowledge of NRP-Core and its installation, then focuses on explaining the components necessary for a simulation experiment, and finally details the construction of the autonomous driving system, which integrates object detection and autonomous control.